Chinese Discharge Drug Recommendation in Metabolic Diseases with Large Language Models
Li, Juntao, Yuan, Haobin, Luo, Ling, Jiang, Yan, Wang, Fan, Zhang, Ping, Lv, Huiyi, Wang, Jian, Sun, Yuanyuan, Lin, Hongfei
Intelligent drug recommendation based on Electronic Health Records (EHRs) is critical for improving the quality and efficiency of clinical decision-making. By leveraging large-scale patient data, drug recommendation systems can assist physicians in selecting the most appropriate medications according to a patient's medical history, diagnoses, laboratory results, and comorbidities. Recent advances in large language models (LLMs) have shown remarkable capabilities in complex reasoning and medical text understanding, making them promising tools for drug recommendation tasks. However, the application of LLMs to Chinese clinical medication recommendation remains largely unexplored. In this work, we conduct a systematic investigation of LLM-based methodologies for Chinese discharge medication recommendation. We evaluate several representative LLM families (GLM, Llama, Qwen) under a unified methodological framework including zero-shot prompting, in-context learning, chain-of-thought prompting, and supervised fine-tuning using LoRA. We analyze model behavior across reasoning styles, error patterns, domain adaptation mechanisms, and robustness. Experimental results show that while supervised fine-tuning improves model performance, there remains substantial room for improvement, with the best model achieving an F1 score of 0.5648 and a Jaccard score of 0.4477. Our findings highlight both the potential and limitations of LLMs for Chinese drug recommendation.
- Asia > China > Liaoning Province > Dalian (0.05)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- North America > United States > Massachusetts (0.04)
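The F1 and Jaccard scores reported above compare a model's predicted discharge drug set against the gold set for each patient. A minimal sketch of these per-patient set metrics, with hypothetical drug names (the paper's exact averaging scheme is not stated in the abstract):

```python
def set_f1(pred, gold):
    """F1 between a predicted and a gold drug set."""
    pred, gold = set(pred), set(gold)
    if not pred and not gold:
        return 1.0  # convention: two empty sets match perfectly
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def set_jaccard(pred, gold):
    """Jaccard similarity: intersection over union of the two sets."""
    pred, gold = set(pred), set(gold)
    if not pred and not gold:
        return 1.0
    return len(pred & gold) / len(pred | gold)

# Toy example: 3 drugs predicted, 2 correct, 4 in the gold set.
pred = {"metformin", "insulin glargine", "atorvastatin"}
gold = {"metformin", "insulin glargine", "acarbose", "losartan"}
print(round(set_f1(pred, gold), 4))       # 0.5714
print(round(set_jaccard(pred, gold), 4))  # 0.4
```

Corpus-level scores are then typically obtained by averaging these per-patient values.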
Count-Based Approaches Remain Strong: A Benchmark Against Transformer and LLM Pipelines on Structured EHR
Gao, Jifan, Rosenthal, Michael, Wolpin, Brian, Cristea, Simona
Structured electronic health records (EHR) are essential for clinical prediction. While count-based learners continue to perform strongly on such data, no benchmarking has directly compared them against more recent mixture-of-agents LLM pipelines, which have been reported to outperform single LLMs in various NLP tasks. In this study, we evaluated three categories of methodologies for EHR prediction using the EHRSHOT dataset: count-based models built from ontology roll-ups with two time bins, based on LightGBM and the tabular foundation model TabPFN; a pretrained sequential transformer (CLMBR); and a mixture-of-agents pipeline that converts tabular histories to natural-language summaries followed by a text classifier. We assessed eight outcomes using the EHRSHOT dataset. Across the eight evaluation tasks, head-to-head wins were largely split between the count-based and the mixture-of-agents methods. Given their simplicity and interpretability, count-based models remain a strong candidate for structured EHR benchmarking. The source code is available at: https://github.com/cristea-lab/Structured_EHR_Benchmark.
- Research Report > Experimental Study (0.67)
- Research Report > New Finding (0.66)
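The count-based baseline described above (code counts rolled into two time bins) can be sketched in plain Python. The 30-day boundary, the feature naming, and the omission of the ontology roll-up step are simplifying assumptions for illustration, not EHRSHOT's exact configuration:

```python
from collections import Counter

def count_features(events, recent_days=30):
    """Roll coded EHR events into two time bins of code counts:
    'recent' (within recent_days before the index date) and 'past'
    (earlier). `events` is a list of (days_before_index, code) pairs.
    The resulting dict feeds a tabular learner such as LightGBM."""
    feats = Counter()
    for days_before, code in events:
        bin_name = "recent" if days_before <= recent_days else "past"
        feats[f"{bin_name}:{code}"] += 1
    return dict(feats)

# Hypothetical patient: two recent diabetes codes, older diabetes
# and hypertension codes.
events = [(3, "E11.9"), (10, "E11.9"), (400, "E11.9"), (200, "I10")]
print(count_features(events))
# {'recent:E11.9': 2, 'past:E11.9': 1, 'past:I10': 1}
```

The appeal noted in the abstract is visible here: every feature is a readable count, so model attributions map directly back to clinical codes.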
PatientSim: A Persona-Driven Simulator for Realistic Doctor-Patient Interactions
Kyung, Daeun, Chung, Hyunseung, Bae, Seongsu, Kim, Jiho, Sohn, Jae Ho, Kim, Taerim, Kim, Soo Kyung, Choi, Edward
Doctor-patient consultations require multi-turn, context-aware communication tailored to diverse patient personas. Training or evaluating doctor LLMs in such settings requires realistic patient interaction systems. However, existing simulators often fail to reflect the full range of personas seen in clinical practice. To address this, we introduce PatientSim, a patient simulator that generates realistic and diverse patient personas for clinical scenarios, grounded in medical expertise. PatientSim operates using: 1) clinical profiles, including symptoms and medical history, derived from real-world data in the MIMIC-ED and MIMIC-IV datasets, and 2) personas defined by four axes: personality, language proficiency, medical history recall level, and cognitive confusion level, resulting in 37 unique combinations. We evaluate eight LLMs for factual accuracy and persona consistency. The top-performing open-source model, Llama 3.3 70B, is validated by four clinicians to confirm the robustness of our framework. As an open-source, customizable platform, PatientSim provides a reproducible and scalable solution that can be customized for specific training needs. Offering a privacy-compliant environment, it serves as a robust testbed for evaluating medical dialogue systems across diverse patient presentations and shows promise as an educational tool for healthcare. The code is available at https://github.com/dek924/PatientSim.
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Asia > Middle East > Israel (0.04)
- North America > United States > Texas > Coleman County (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
SARHAchat: An LLM-Based Chatbot for Sexual and Reproductive Health Counseling
Yang, Jiaye, Zhao, Xinyu, Chen, Tianlong, Brennan, Kandyce
While Artificial Intelligence (AI) shows promise in healthcare applications, existing conversational systems often falter in complex and sensitive medical domains such as Sexual and Reproductive Health (SRH). These systems frequently struggle with hallucination and lack the specialized knowledge required, particularly for sensitive SRH topics. Furthermore, current AI approaches in healthcare tend to prioritize diagnostic capabilities over comprehensive patient care and education. Addressing these gaps, this work at the UNC School of Nursing introduces SARHAchat, a proof-of-concept Large Language Model (LLM)- based chatbot. SARHAchat is designed as a reliable, user-centered system integrating medical expertise with empathetic communication to enhance SRH care delivery. Our evaluation demonstrates SARHAchat's ability to provide accurate and contextually appropriate contraceptive counseling while maintaining a natural conversational flow. The demo is available at https://sarhachat.com/.
- North America > United States > North Carolina > Orange County > Chapel Hill (0.05)
- North America > United States > Florida > Miami-Dade County > Miami (0.05)
- Asia > Middle East > Jordan (0.05)
Large Language Models for Drug Overdose Prediction from Longitudinal Medical Records
Nahian, Md Sultan Al, Delcher, Chris, Harris, Daniel, Akpunonu, Peter, Kavuluru, Ramakanth
The ability to predict drug overdose risk from a patient's medical records is crucial for timely intervention and prevention. Traditional machine learning models have shown promise in analyzing longitudinal medical records for this task. However, recent advancements in large language models (LLMs) offer an opportunity to enhance prediction performance by leveraging their ability to process long textual data and their inherent prior knowledge across diverse tasks. In this study, we assess the effectiveness of OpenAI's GPT-4o LLM in predicting drug overdose events using patients' longitudinal insurance claims records. We evaluate its performance in both fine-tuned and zero-shot settings, comparing them to strong traditional machine learning methods as baselines. Our results show that LLMs not only outperform traditional models in certain settings but can also predict overdose risk in a zero-shot setting without task-specific training.
Drug overdose (OD) is a major public health crisis in the United States, leading to a substantial number of emergency medical interventions and fatalities each year. According to the Centers for Disease Control and Prevention (CDC), drug overdoses claimed approximately 107,941 [1] lives in the U.S. in 2022, highlighting the urgent need for effective prevention and intervention strategies. Besides fatal outcomes and lost quality of life for patients, the misuse of prescription medications and illicit drugs, along with polysubstance abuse, has placed an immense burden on healthcare systems, emergency responders, and policymakers. Identifying individuals at risk early can facilitate timely interventions, such as targeted clinical assessments, behavioral support, and prescription monitoring, thereby reducing the likelihood of fatal outcomes.
Md Sultan Al Nahian is with the Institute for Biomedical Informatics, University of Kentucky, Lexington, KY 40536 USA. Chris Delcher and Daniel Harris are with the Department of Pharmacy Practice and Science, University of Kentucky, Lexington, KY 40536 USA. Peter Akpunonu is with the Department of Emergency Medicine, University of Kentucky, Lexington, KY 40536 USA.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
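Using an LLM on longitudinal claims, as described above, implies serializing each patient's record into text. A minimal sketch of such serialization; the field names and prompt wording are illustrative assumptions, not the paper's actual prompt:

```python
def claims_to_prompt(claims):
    """Serialize longitudinal claims into a chronological narrative
    followed by a yes/no risk question for the LLM."""
    lines = ["Patient claims history (oldest first):"]
    for c in sorted(claims, key=lambda c: c["date"]):
        lines.append(f"- {c['date']} [{c['type']}]: {c['description']}")
    lines.append(
        "Question: Based on this history, is the patient at high risk "
        "of a drug overdose event? Answer Yes or No."
    )
    return "\n".join(lines)

# Hypothetical claims; events are sorted into chronological order.
claims = [
    {"date": "2021-06-01", "type": "Rx", "description": "oxycodone 10mg, 30-day supply"},
    {"date": "2021-03-15", "type": "ER visit", "description": "acute back pain"},
]
print(claims_to_prompt(claims))
```

In the zero-shot setting this prompt goes to the model as-is; in the fine-tuned setting, such serialized histories paired with observed outcomes would form the training examples.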
Evaluating the Feasibility and Accuracy of Large Language Models for Medical History-Taking in Obstetrics and Gynecology
Liu, Dou, Long, Ying, Zuoqiu, Sophia, Tang, Tian, Yin, Rong
Effective physician-patient communication in pre-diagnostic environments, most specifically in complex and sensitive medical areas such as infertility, is critical but time-consuming, making clinic workflows inefficient. Recent advancements in Large Language Models (LLMs) offer a potential solution for automating conversational medical history-taking and improving diagnostic accuracy. This study evaluates the feasibility and performance of LLMs in those tasks for infertility cases. An AI-driven conversational system was developed to simulate physician-patient interactions with ChatGPT-4o and ChatGPT-4o-mini. A total of 70 real-world infertility cases were processed, generating 420 diagnostic histories. Model performance was assessed using F1 score, Differential Diagnosis (DDs) Accuracy, and Accuracy of Infertility Type Judgment (ITJ). ChatGPT-4o-mini outperformed ChatGPT-4o in information extraction accuracy (F1 score: 0.9258 vs. 0.9029, p = 0.045, d = 0.244) and demonstrated higher completeness in medical history-taking (97.58% vs. 77.11%), suggesting that ChatGPT-4o-mini is more effective at extracting detailed patient information, which is critical for improving diagnostic accuracy. In contrast, ChatGPT-4o performed slightly better in differential diagnosis accuracy (2.0524 vs. 2.0048, p > 0.05). ITJ accuracy was higher for ChatGPT-4o-mini (0.6476 vs. 0.5905) but with lower consistency (Cronbach's $\alpha$ = 0.562), suggesting variability in classification reliability. Both models demonstrated strong feasibility in automating infertility history-taking, with ChatGPT-4o-mini excelling in completeness and extraction accuracy. Future studies should prioritize expert validation of accuracy and dependability in clinical settings, fine-tuning of the AI models, and larger datasets with a broader mix of infertility cases.
- Asia > China > Sichuan Province > Chengdu (0.05)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- Asia > India (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
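The abstract reports an effect size alongside the p-value (d = 0.244). Cohen's d can be computed from two samples of per-case scores; the sketch below uses hypothetical data and the pooled-standard-deviation variant, since the paper's per-case scores and exact d variant are not given in the abstract:

```python
from statistics import mean, stdev

def cohens_d(a, b):
    """Cohen's d: difference of means divided by the pooled sample
    standard deviation of the two groups."""
    na, nb = len(a), len(b)
    pooled_sd = (((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
                 / (na + nb - 2)) ** 0.5
    return (mean(a) - mean(b)) / pooled_sd

# Hypothetical per-case F1 scores for two models.
a = [0.93, 0.92, 0.94, 0.91]
b = [0.90, 0.91, 0.89, 0.92]
print(round(cohens_d(a, b), 3))
```

As a rule of thumb, d around 0.2 (as reported here) is a small effect, so the F1 difference is statistically detectable but modest in magnitude.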
MAP: Evaluation and Multi-Agent Enhancement of Large Language Models for Inpatient Pathways
Chen, Zhen, Peng, Zhihao, Liang, Xusheng, Wang, Cheng, Liang, Peigan, Zeng, Linsheng, Ju, Minjie, Yuan, Yixuan
Inpatient pathways demand complex clinical decision-making based on comprehensive patient information, posing critical challenges for clinicians. Despite advancements in large language models (LLMs) in medical applications, limited research has focused on artificial intelligence (AI) systems for inpatient pathways, due to the lack of large-scale inpatient datasets. Moreover, existing medical benchmarks typically concentrate on medical question-answering and examinations, ignoring the multifaceted nature of clinical decision-making in inpatient settings. To address these gaps, we first developed the Inpatient Pathway Decision Support (IPDS) benchmark from the MIMIC-IV database, encompassing 51,274 cases across nine triage departments and 17 major disease categories alongside 16 standardized treatment options. Then, we proposed the Multi-Agent Inpatient Pathways (MAP) framework to accomplish inpatient pathways with three clinical agents: a triage agent managing patient admission, a diagnosis agent serving as the primary decision maker at the department, and a treatment agent providing treatment plans. Additionally, our MAP framework includes a chief agent overseeing the inpatient pathways to guide and promote these three clinical agents. Extensive experiments showed that our MAP improved diagnosis accuracy by 25.10% compared to the state-of-the-art LLM HuatuoGPT2-13B. It is worth noting that our MAP demonstrated significant clinical compliance, outperforming three board-certified clinicians by 10%-12%, establishing a foundation for inpatient pathways systems.
- Asia > China > Hong Kong (0.04)
- North America > United States (0.04)
- Asia > Middle East > Israel (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
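The agent layout described above (triage, then diagnosis, then treatment, with a chief agent overseeing each step) can be sketched as a simple orchestration loop. The callables stand in for LLM calls, and all names are illustrative, not the MAP framework's actual API:

```python
def run_inpatient_pathway(patient, triage, diagnose, treat, chief_review):
    """Sequential triage -> diagnosis -> treatment pipeline in which a
    chief agent reviews (and may revise) each stage's output before it
    feeds the next stage."""
    dept = chief_review("triage", triage(patient))
    dx = chief_review("diagnosis", diagnose(patient, dept))
    plan = chief_review("treatment", treat(patient, dx))
    return {"department": dept, "diagnosis": dx, "treatment": plan}

# Stub agents for demonstration; each would be an LLM call in practice.
result = run_inpatient_pathway(
    {"complaint": "chest pain"},
    triage=lambda p: "cardiology",
    diagnose=lambda p, dept: "unstable angina",
    treat=lambda p, dx: "antiplatelet therapy",
    chief_review=lambda stage, out: out,  # chief approves everything here
)
print(result)
```

The key structural point is that each stage consumes the reviewed output of the previous one, so the chief agent can intervene before an error propagates down the pathway.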
Exploring the Inquiry-Diagnosis Relationship with Advanced Patient Simulators
Liu, Zhaocheng, Tu, Quan, Ye, Wen, Xiao, Yu, Zhang, Zhishou, Cui, Hengfu, Zhu, Yalun, Ju, Qiang, Li, Shizheng, Xie, Jian
Online medical consultation (OMC) restricts doctors to gathering patient information solely through inquiries, making the already complex sequential decision-making process of diagnosis even more challenging. Recently, the rapid advancement of large language models has demonstrated a significant potential to transform OMC. However, most studies have primarily focused on improving diagnostic accuracy under conditions of relatively sufficient information, while paying limited attention to the "inquiry" phase of the consultation process. This lack of focus has left the relationship between "inquiry" and "diagnosis" insufficiently explored. In this paper, we first extract real patient interaction strategies from authentic doctor-patient conversations and use these strategies to guide the training of a patient simulator that closely mirrors real-world behavior. By inputting medical records into our patient simulator to simulate patient responses, we conduct extensive experiments to explore the relationship between "inquiry" and "diagnosis" in the consultation process. Experimental results demonstrate that inquiry and diagnosis adhere to Liebig's law: poor inquiry quality limits the effectiveness of diagnosis, regardless of diagnostic capability, and vice versa. Furthermore, the experiments reveal significant differences in the inquiry performance of various models. To investigate this phenomenon, we categorize the inquiry process into four types: (1) chief complaint inquiry; (2) specification of known symptoms; (3) inquiry about accompanying symptoms; and (4) gathering family or medical history. We analyze the distribution of inquiries across the four types for different models to explore the reasons behind their significant performance differences. We plan to open-source the weights and related code of our patient simulator at https://github.com/LIO-H-ZEN/PatientSimulator.
- North America > United States > District of Columbia > Washington (0.05)
- Asia > China > Beijing > Beijing (0.05)
- Asia > Taiwan (0.04)
- Health & Medicine > Diagnostic Medicine (1.00)
- Health & Medicine > Consumer Health (1.00)
- Health & Medicine > Therapeutic Area > Gastroenterology (0.68)
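The Liebig's-law finding above says the weaker of the two stages caps overall consultation quality. As a worked illustration (the min() form is a deliberate simplification, not the paper's fitted model):

```python
def consultation_quality(inquiry_quality, diagnostic_capability):
    """Liebig's-law reading of the inquiry-diagnosis relationship:
    the outcome is bounded by the weaker of the two stages, so
    strengthening the stronger stage alone yields no gain."""
    return min(inquiry_quality, diagnostic_capability)

# A strong diagnostician cannot compensate for poor inquiry, and
# thorough inquiry cannot compensate for weak diagnosis:
print(consultation_quality(0.3, 0.9))  # 0.3
print(consultation_quality(0.9, 0.3))  # 0.3
print(consultation_quality(0.9, 0.9))  # 0.9
```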
AI chatbots fail to diagnose patients by talking with them
Advanced artificial intelligence models score well on professional medical exams but still flunk one of the most crucial physician tasks: talking with patients to gather relevant medical information and deliver an accurate diagnosis. "While large language models show impressive results on multiple-choice tests, their accuracy drops significantly in dynamic conversations," says Pranav Rajpurkar at Harvard University. That became evident when researchers developed a method for evaluating a clinical AI model's reasoning capabilities based on simulated doctor-patient conversations. The "patients" were based on 2000 medical cases primarily drawn from professional US medical board exams. "Simulating patient interactions enables the evaluation of medical history-taking skills, a critical component of clinical practice that cannot be assessed using case vignettes," says Shreya Johri, also at Harvard University.
PIORS: Personalized Intelligent Outpatient Reception based on Large Language Model with Multi-Agents Medical Scenario Simulation
Bao, Zhijie, Liu, Qingyun, Guo, Ying, Ye, Zhengqiang, Shen, Jun, Xie, Shirong, Peng, Jiajie, Huang, Xuanjing, Wei, Zhongyu
In China, receptionist nurses face overwhelming workloads in outpatient settings, limiting their time and attention for each patient and ultimately reducing service quality. In this paper, we present the Personalized Intelligent Outpatient Reception System (PIORS). This system integrates an LLM-based reception nurse, and a collaboration between the LLM and the hospital information system (HIS), into a real outpatient reception setting, aiming to deliver personalized, high-quality, and efficient reception services. Additionally, to enhance the performance of LLMs in real-world healthcare scenarios, we propose a medical conversational data generation framework named Service Flow aware Medical Scenario Simulation (SFMSS), aiming to adapt the LLM to real-world environments and the PIORS setting. We evaluate the effectiveness of PIORS and SFMSS through automatic and human assessments involving 15 users and 15 clinical experts. The results demonstrate that PIORS-Nurse outperforms all baselines, including the current state-of-the-art model GPT-4o, and aligns with human preferences and clinical needs. Further details and a demo can be found at https://github.com/FudanDISC/PIORS
- Research Report > Experimental Study (0.67)
- Research Report > Promising Solution (0.48)
- Health & Medicine > Health Care Providers & Services (1.00)
- Health & Medicine > Consumer Health (0.93)
- Education > Educational Setting > Higher Education (0.68)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)